[PROTOTYPE] Use l4t-cuda base for 31.5% smaller Jetson 6.2.0 images #1714
Closed
Conversation
Instead of downgrading rasterio, compile GDAL 3.8.5 from source to meet rasterio 1.4.0's requirement for GDAL >= 3.5. JetPack r36.4.0 ships with GDAL 3.4.1, which is incompatible with rasterio 1.4.x; building from source solves this while keeping packages up to date. Changes:
- Compile GDAL 3.8.5 from source in the builder stage
- Copy the GDAL libraries and data to the runtime stage
- Install the required GDAL dependencies
- Set the GDAL environment variables (GDAL_CONFIG, GDAL_DATA, LD_LIBRARY_PATH)
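As a rough illustration of this commit, the builder-stage build and the runtime-stage copy might look like the sketch below. The base images, download URL, and install prefix are assumptions; only the GDAL version, the copied artifacts, and the three environment variables come from the commit message.

```dockerfile
# Builder stage: compile GDAL 3.8.5 with CMake (download URL and /usr/local
# prefix are assumptions).
FROM nvcr.io/nvidia/l4t-jetpack:r36.4.0 AS builder
RUN apt-get update && apt-get install -y --no-install-recommends \
        build-essential cmake wget \
        libproj-dev libsqlite3-dev libtiff-dev libcurl4-openssl-dev \
    && wget -q https://download.osgeo.org/gdal/3.8.5/gdal-3.8.5.tar.gz \
    && tar xzf gdal-3.8.5.tar.gz && cd gdal-3.8.5 \
    && cmake -S . -B build -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=/usr/local \
    && cmake --build build -j"$(nproc)" \
    && cmake --install build

# Runtime stage: carry over only the built libraries, the GDAL data files, and
# gdal-config, then point rasterio at them.
FROM nvcr.io/nvidia/l4t-jetpack:r36.4.0 AS runtime
COPY --from=builder /usr/local/lib/ /usr/local/lib/
COPY --from=builder /usr/local/share/gdal/ /usr/local/share/gdal/
COPY --from=builder /usr/local/bin/gdal-config /usr/local/bin/gdal-config
ENV GDAL_CONFIG=/usr/local/bin/gdal-config \
    GDAL_DATA=/usr/local/share/gdal \
    LD_LIBRARY_PATH=/usr/local/lib:${LD_LIBRARY_PATH}
```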
The runtime stage only needs the runtime libraries to run GDAL, not the development headers and static libraries, which reduces image size. Changed in the runtime stage:
- libproj-dev → libproj25
- libsqlite3-dev → libsqlite3-0
- libtiff-dev → libtiff5
- libcurl4-openssl-dev → libcurl4
- etc.
The builder stage keeps the -dev packages needed for compilation.
Changed the GDAL build from Make to Ninja for faster parallel compilation:
- Added -GNinja to cmake to generate Ninja build files
- Use 'ninja' instead of 'make -j$(nproc)'
- Use 'ninja install' instead of 'make install'
Ninja is faster and more efficient for parallel builds, and the ninja-build package is already installed in the dependencies.
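For example (paths assumed; only the generator switch and the ninja commands come from the commit message):

```dockerfile
# Same GDAL build as above, but generating Ninja files instead of Makefiles.
RUN cd gdal-3.8.5 \
    && cmake -S . -B build -GNinja -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=/usr/local \
    && ninja -C build \
    && ninja -C build install
```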
Updated GDAL from 3.8.5 to 3.11.5 (the latest as of Nov 4, 2025). Benefits of 3.11.x:
- Latest bug fixes and security updates
- Improved performance
- New format support
- Still meets rasterio 1.4.0's requirement (GDAL >= 3.5)
Changed libproj25 to libproj22, the correct package name for Ubuntu 22.04 (Jammy). The build was failing with: E: Unable to locate package libproj25
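Putting the package split together, the runtime stage ends up installing only the shared libraries. A sketch using the Jammy package names mentioned above; the list is illustrative, not exhaustive:

```dockerfile
# Runtime stage: shared libraries only, no headers or static archives.
RUN apt-get update && apt-get install -y --no-install-recommends \
        libproj22 libsqlite3-0 libtiff5 libcurl4 \
    && rm -rf /var/lib/apt/lists/*
```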
- Bump pylogix from 1.0.5 to 1.1.3 (latest version)
- Add the file package to both the builder and runtime stages; the file command is required by the Arena API for binary architecture validation
This ensures compatibility with the latest pylogix version and enables proper Arena SDK functionality.
The Dockerfile incorrectly specified torch>=2.8.0, which doesn't exist, causing pip to fall back to CPU-only PyTorch from PyPI instead of using the GPU-enabled build from jetson-ai-lab.io. Changed to torch>=2.0.1,<2.7.0 to match requirements.sam.txt and ensure that GPU-enabled PyTorch is installed from the Jetson AI Lab index. This fixes the critical bug where the container had no PyTorch GPU support.
This fulfills the TODO in requirements.sam.txt to update to PyTorch 2.8.0 now that a pre-built flash-attn is available on jetson-ai-lab.io. Changes:
- PyTorch: >=2.0.1,<2.7.0 → >=2.8.0
- torchvision: >=0.15.2 → >=0.23.0 (latest)
- Added: flash-attn>=2.8.2 for SAM2 support
This enables full GPU acceleration for SAM2 and other transformer models with flash-attention support on Jetson Orin.
Updated requirements.transformers.txt to match requirements.sam.txt:
- torch: >=2.0.1,<2.7.0 → >=2.8.0
- torchvision: >=0.15.0 → >=0.23.0
- Added: flash-attn>=2.8.2
This resolves the dependency conflict that was causing builds to fail.
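A hedged sketch of what installing these pins might look like. The index build argument is a placeholder (the actual jetson-ai-lab.io index URL is not reproduced here), and the use of uv mirrors the rest of the Dockerfile:

```dockerfile
# Placeholder for the Jetson AI Lab package index URL.
ARG JETSON_AI_LAB_PIP_INDEX
RUN uv pip install --system \
        --extra-index-url "${JETSON_AI_LAB_PIP_INDEX}" \
        "torch>=2.8.0" "torchvision>=0.23.0" "flash-attn>=2.8.2"
```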
…icts
- Remove libgdal-dev from apt-get to prevent a conflict with the compiled GDAL 3.11.5
- Add GDAL version verification to ensure the correct version is available
- Pin flash-attn to 2.8.2 to match the pre-built wheel on jetson-ai-lab.io
Fixes the GDAL version detection issue where rasterio was finding the system GDAL 3.4.1 instead of the compiled 3.11.5, and flash-attn build failures when uv tried to compile 2.8.3 from source.
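A hypothetical verification step along these lines would fail the build early if the system GDAL 3.4.1 were still the one on PATH (the exact check used in the PR may differ):

```dockerfile
# Print the detected version, then fail unless it is the compiled 3.11.x.
RUN gdal-config --version \
    && gdal-config --version | grep -q '^3\.11\.'
```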
PyTorch 2.8.0 from jetson-ai-lab.io was compiled with NumPy 1.x and crashes when NumPy 2.x is installed. This adds a Jetson-specific constraint to ensure NumPy 1.26.x is used instead of 2.x.
- Changed _requirements.txt to allow NumPy 1.26+ instead of requiring 2.0+
- requirements.jetson.txt enforces NumPy <2.0 for PyTorch 2.8.0 compatibility
- This lets Jetson builds use NumPy 1.x while other builds can use 2.x
The numpy package is already specified in _requirements.txt and requirements.jetson.txt with proper version constraints. Having it as a standalone argument causes uv to try to install the latest version (2.x) which conflicts with the Jetson requirement of <2.0.0.
The previous build used cached requirements files with the old NumPy constraint. This comment forces Docker to invalidate the cache and copy the updated requirements files with numpy>=1.26.0,<2.3.0.
Changes:
- requirements.sdk.http.txt: change numpy>=2.0.0,<2.3.0 to numpy>=1.26.0,<2.3.0
- Dockerfile.onnx.jetson.6.2.0: add ARG CACHE_BUST to force cache invalidation
This resolves the unsatisfiable dependency conflict where PyTorch 2.8.0 from jetson-ai-lab.io requires NumPy 1.x but requirements.sdk.http.txt specified NumPy 2.x.
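The cache-busting pattern is roughly the following; the ARG name comes from the commit message, while the echo line and the copied path are illustrative. Passing a new value (for example --build-arg CACHE_BUST=$(date +%s)) causes a cache miss at the ARG's first use, so every later layer, including the requirements COPY, is rebuilt.

```dockerfile
ARG CACHE_BUST=1
# First use of the ARG: a changed value invalidates the cache from here on.
RUN echo "cache bust: ${CACHE_BUST}"
COPY requirements/ ./requirements/
```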
This prototype explores using nvcr.io/nvidia/l4t-cuda:12.6.11-runtime as the base image instead of the full l4t-jetpack:r36.4.0 stack.
Benefits:
- Smaller base image (CUDA runtime vs full JetPack)
- No pre-installed package conflicts
- Full control over dependency versions
- Cleaner dependency management
Comparison:
- Base: l4t-cuda:12.6.11-runtime vs l4t-jetpack:r36.4.0
- CUDA: 12.6.11 vs 12.2
- Same: PyTorch 2.8.0, GDAL 3.11.5, Python dependencies
See docker/BUILD_COMPARISON.md for the detailed comparison methodology.
- Added curl to the builder-stage packages (needed for the uv installer)
- Added a uv --version verification step to ensure the installation succeeds (sketched below)
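A minimal version of that bootstrap, assuming Astral's standard install script and its default install location (which varies between uv releases):

```dockerfile
RUN apt-get update && apt-get install -y --no-install-recommends curl ca-certificates \
    && curl -LsSf https://astral.sh/uv/install.sh | sh
# The installer drops uv into the user's local bin directory (an assumption;
# older releases used ~/.cargo/bin instead).
ENV PATH="/root/.local/bin:${PATH}"
RUN uv --version
```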
Changes:
- Added a cudnn-source stage to extract cuDNN from l4t-jetpack:r36.4.0
- Copy the cuDNN libraries to /usr/local/cuda/lib64/ in the runtime stage
- Copy the cuDNN headers to /usr/local/cuda/include/
- Update LD_LIBRARY_PATH to include /usr/local/cuda/lib64
- Update the label to document the cuDNN source
This fixes the PyTorch import error (libcudnn.so.9 missing) while maintaining the 63% size reduction compared to the full l4t-jetpack base.
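A sketch of the cudnn-source stage. The library and header locations inside l4t-jetpack are assumptions based on a typical aarch64 Ubuntu layout; only the destination paths come from the commit message.

```dockerfile
FROM nvcr.io/nvidia/l4t-jetpack:r36.4.0 AS cudnn-source

FROM nvcr.io/nvidia/l4t-cuda:12.6.11-runtime AS runtime
# Copy the cuDNN 9 shared libraries and headers out of the JetPack image.
COPY --from=cudnn-source /usr/lib/aarch64-linux-gnu/libcudnn*.so* /usr/local/cuda/lib64/
COPY --from=cudnn-source /usr/include/cudnn*.h /usr/local/cuda/include/
ENV LD_LIBRARY_PATH="/usr/local/cuda/lib64:${LD_LIBRARY_PATH}"
```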
PyTorch requires libcupti.so.12 and libnvToolsExt which are not included in the l4t-cuda base image. Copy these from the jetpack source stage.
Results:
- Image size: 8.28 GB (41.7% smaller than the 14.2 GB jetpack version)
- Build time: ~10 minutes on a Jetson Orin in MAXN mode
- All components verified working (PyTorch, CUDA, cuDNN, GPU)
Recommendation: adopt the l4t-cuda base for production use.
onnxruntime is a critical missing dependency for the inference server to work. Changes:
- Build onnxruntime 1.20.0 from source with CUDA 12.6 and TensorRT
- Copy the TensorRT libraries from jetpack into the builder and runtime stages
- Use parallel=12 for faster compilation in MAXN mode
This enables full ONNX model support with GPU acceleration.
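An illustrative builder-stage command for that build. The flags are standard onnxruntime build.sh options, but the exact paths, flag set, and wheel name used in the PR may differ.

```dockerfile
RUN git clone --recursive --depth 1 --branch v1.20.0 \
        https://github.com/microsoft/onnxruntime /tmp/onnxruntime \
    && cd /tmp/onnxruntime \
    && ./build.sh --config Release --update --build --build_wheel --skip_tests \
        --use_cuda --cuda_home /usr/local/cuda --cudnn_home /usr/lib/aarch64-linux-gnu \
        --use_tensorrt --tensorrt_home /usr/lib/aarch64-linux-gnu \
        --parallel 12 \
    && uv pip install --system /tmp/onnxruntime/build/Linux/Release/dist/onnxruntime_gpu-*.whl
```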
The onnxruntime build script imports PyTorch, which requires libcupti.so.12. Copy these libs into the builder stage and set LD_LIBRARY_PATH before building.
onnxruntime compilation requires nvcc and CUDA development tools which are only available in the -devel image, not -runtime.
Architecture change:
- Stage 1 (builder): l4t-jetpack:r36.4.0, which has nvcc, CUDA 12.6, cuDNN, and TensorRT for compilation. Compile GDAL and onnxruntime here and install all Python packages.
- Stage 2 (runtime): l4t-cuda:12.6.11-runtime, a minimal CUDA runtime; this base determines the final image size. Copy the compiled binaries and libraries, including the cuDNN/TensorRT libs, from the builder.
This is cleaner than a 3-stage build and matches how the existing Dockerfile works.
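The overall skeleton, consolidating the pieces sketched above; stage names, the Python path, and the copied locations are illustrative:

```dockerfile
# Stage 1: full JetPack image with the CUDA toolchain, used only for compilation.
FROM nvcr.io/nvidia/l4t-jetpack:r36.4.0 AS builder
# ... compile GDAL and onnxruntime, install all Python packages under /usr/local ...

# Stage 2: minimal CUDA runtime; this base determines the final image size.
FROM nvcr.io/nvidia/l4t-cuda:12.6.11-runtime AS runtime
COPY --from=builder /usr/local/lib/ /usr/local/lib/
# Python version and site layout are assumptions (Ubuntu 22.04 ships Python 3.10).
COPY --from=builder /usr/local/lib/python3.10/dist-packages/ /usr/local/lib/python3.10/dist-packages/
# The cuDNN/TensorRT shared libraries are copied as in the cudnn-source sketch above.
ENV LD_LIBRARY_PATH="/usr/local/cuda/lib64:/usr/local/lib:${LD_LIBRARY_PATH}"
```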
The gpu_http.py script defines the app but doesn't run uvicorn. Change the entrypoint to match the working jetpack version.
uvicorn is installed as a Python module but is not on PATH as a standalone executable.
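A sketch of the resulting entrypoint, launching uvicorn through the interpreter since there is no standalone executable on PATH; the module path, host, and port are assumptions for illustration.

```dockerfile
# Run uvicorn as a module so it does not need to be on PATH.
ENTRYPOINT ["python3", "-m", "uvicorn", "gpu_http:app", "--host", "0.0.0.0", "--port", "9001"]
```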
Build and install inference packages (core, gpu, cli, sdk) to provide the 'inference' command-line tool for benchmarking.
The inference command is installed in /usr/local/bin by the inference-cli package and needs to be copied to the runtime image.
The inference CLI script uses a #!/usr/bin/python shebang, which requires a python symlink to python3.
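One way to satisfy that shebang (Ubuntu's python-is-python3 package is an equivalent alternative):

```dockerfile
# Provide a `python` name for scripts whose shebang is #!/usr/bin/python.
RUN ln -sf /usr/bin/python3 /usr/bin/python
```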
Enable TensorRT FP16, engine caching, and the OpenBLAS ARM optimization, and increase the number of concurrent workflow steps to match the jetpack configuration.
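A heavily hedged sketch of that tuning. ORT_TENSORRT_FP16_ENABLE, ORT_TENSORRT_ENGINE_CACHE_ENABLE, and ORT_TENSORRT_CACHE_PATH are standard onnxruntime TensorRT execution-provider variables, and OPENBLAS_CORETYPE is the usual OpenBLAS hint on ARM; the workflow-steps variable name and every value shown are assumptions meant to mirror the jetpack configuration, not the PR's actual settings.

```dockerfile
# WORKFLOWS_MAX_CONCURRENT_STEPS is a hypothetical name for the
# concurrent-workflow-steps setting; the cache path and values are illustrative.
ENV ORT_TENSORRT_FP16_ENABLE=1 \
    ORT_TENSORRT_ENGINE_CACHE_ENABLE=1 \
    ORT_TENSORRT_CACHE_PATH=/tmp/ort_trt_cache \
    OPENBLAS_CORETYPE=ARMV8 \
    WORKFLOWS_MAX_CONCURRENT_STEPS=4
```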
Collaborator
I want to raise a veto for
Contributor
Author
Superseded by #1718, which has a cleaner implementation and better benchmark results (41.7% size reduction vs 31.5%).
Problem
The current Jetson 6.2.0 Docker image using the l4t-jetpack:r36.4.0 base is 14.2 GB, which:
Solution
Use the minimal l4t-cuda:12.6.11-runtime base image for the final runtime stage while keeping l4t-jetpack only for compilation. This prototype demonstrates a 2-stage build approach:
- l4t-jetpack:r36.4.0: has nvcc, CUDA dev tools, cuDNN, and TensorRT for compilation
- l4t-cuda:12.6.11-runtime: minimal CUDA runtime determining the final image size
Benefits
Testing
E2E Testing on Jetson Orin
Performance Comparison
Files Changed
- docker/dockerfiles/Dockerfile.onnx.jetson.6.2.0.cuda-base: new prototype Dockerfile
- docker/BUILD_COMPARISON.md: detailed comparison documentation
Migration Path
This is a PROTOTYPE for evaluation. See docker/BUILD_COMPARISON.md for the full analysis.
Related
Addresses #1695 and ongoing Jetson image size concerns.